Skip to content

Conversation

@jonathan-buttner
Copy link
Contributor

This PR refactors the throttled logging logic in a few ways. The new functionality is the following:

  • The first log message received is logged like previously
  • Subsequent messages that matched the first one are not logged (no matter the elapsed time), instead a counter is incremented
  • A thread is created when the Throttler class is initialized which runs on an interval (default is 1 hour) and will emit any messages that had been repeated
  • The repeated log messages include the number of skipped logs and the timestamp of the last log call that occurred

Testing

To test the logic, attempt to create an invalid openai inference endpoint

PUT http://localhost:9200/_inference/text_embedding/openai
{
    "service": "openai",
    "service_settings": {
        "api_key": "<api key>",
        "model_id": "text-embedding-3-small",
        "dimensions": 100,
        "url": "https://api.openai.com/v1/embeddingss" <----- use a custom url that is invalid (trailing s)
    },
    "task_settings": {
        
    }
}

This will cause a message to be emitted. If you send the same PUT request it will not be emitted. This is because the message matched so the counter will be incremented.

To force the message to be emitted we can update the logging interval to a shorter time like 5 seconds

PUT http://localhost:9200/_cluster/settings
{
  "transient": {
    "xpack.inference.logging.interval": "5s"
  }
}

After 5 seconds you should see the message.

Examples:

[2025-03-14T17:00:41,934][WARN ][o.e.x.i.e.h.s.TruncatingRequestManager] [runTask-0] Failed to process the response for request from inference entity id [openai] of type [openai text embedding] with status [404] [Not Found] org.elasticsearch.ElasticsearchStatusException: Received an unsuccessful status code for request from inference entity id [openai] status [404]. Error message: [Invalid URL (POST /v1/embeddingss)]
        at [email protected]/org.elasticsearch.xpack.inference.external.http.retry.BaseResponseHandler.buildError(BaseResponseHandler.java:112)
        at [email protected]/org.elasticsearch.xpack.inference.external.http.retry.BaseResponseHandler.buildError(BaseResponseHandler.java:105)
        at [email protected]/org.elasticsearch.xpack.inference.external.openai.OpenAiResponseHandler.checkForFailureStatusCode(OpenAiResponseHandler.java:90)
        at [email protected]/org.elasticsearch.xpack.inference.external.http.retry.BaseResponseHandler.validateResponse(BaseResponseHandler.java:80)
        at [email protected]/org.elasticsearch.xpack.inference.external.http.retry.RetryingHttpSender$InternalRetrier.lambda$tryAction$3(RetryingHttpSender.java:137)
        at [email protected]/org.elasticsearch.action.ActionListenerImplementations$DelegatingFailureActionListener.onResponse(ActionListenerImplementations.java:233)
        at [email protected]/org.elasticsearch.xpack.inference.external.http.HttpClient.lambda$respondUsingUtilityThread$1(HttpClient.java:140)
        at [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:977)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1575)

[2025-03-14T17:02:24,063][INFO ][o.e.c.s.ClusterSettings  ] [runTask-0] updating [xpack.inference.logging.interval] from [1h] to [5s]
[2025-03-14T17:02:29,069][WARN ][o.e.x.i.e.h.s.TruncatingRequestManager] [runTask-0] Failed to process the response for request from inference entity id [openai] of type [openai text embedding] with status [404] [Not Found], repeated 1 time, last message at [2025-03-14T21:02:13.080722Z] org.elasticsearch.ElasticsearchStatusException: Received an unsuccessful status code for request from inference entity id [openai] status [404]. Error message: [Invalid URL (POST /v1/embeddingss)]
        at [email protected]/org.elasticsearch.xpack.inference.external.http.retry.BaseResponseHandler.buildError(BaseResponseHandler.java:112)
        at [email protected]/org.elasticsearch.xpack.inference.external.http.retry.BaseResponseHandler.buildError(BaseResponseHandler.java:105)
        at [email protected]/org.elasticsearch.xpack.inference.external.openai.OpenAiResponseHandler.checkForFailureStatusCode(OpenAiResponseHandler.java:90)
        at [email protected]/org.elasticsearch.xpack.inference.external.http.retry.BaseResponseHandler.validateResponse(BaseResponseHandler.java:80)
        at [email protected]/org.elasticsearch.xpack.inference.external.http.retry.RetryingHttpSender$InternalRetrier.lambda$tryAction$3(RetryingHttpSender.java:137)
        at [email protected]/org.elasticsearch.action.ActionListenerImplementations$DelegatingFailureActionListener.onResponse(ActionListenerImplementations.java:233)
        at [email protected]/org.elasticsearch.xpack.inference.external.http.HttpClient.lambda$respondUsingUtilityThread$1(HttpClient.java:140)
        at [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:977)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1575)

@jonathan-buttner jonathan-buttner added >non-issue :ml Machine learning Team:ML Meta label for the ML team auto-backport Automatically create backport pull requests when merged v8.19.0 v9.1.0 labels Mar 14, 2025

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.elasticsearch.common.Strings;
Copy link
Contributor Author

@jonathan-buttner jonathan-buttner Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would ignore the previous version of this class and review this like it's a new file. It has changed pretty significantly.

}

this.cancellableTask.set(startResetTask(threadPool));
public void init() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The init() class are to avoid a very unlikely this leak if the object hadn't finished it construction yet.

final ReentrantReadWriteLock.WriteLock writeLock = lock.writeLock();

return threadPool.scheduleWithFixedDelay(logExecutors::clear, resetInterval, threadPool.executor(UTILITY_THREAD_POOL_NAME));
writeLock.lock();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locks so that we don't get inconsistencies while the logging the repeated messages. Otherwise the counts could be incorrect.

final String stringToAppend = messageToAppend;
return new LogExecutor(this.clock, 0, (message) -> executor.accept(message.concat(stringToAppend)));
private void log(String enrichedMessage) {
LogBuilder builder = throttledLogger.atLevel(level);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally tried constructing the LogBuilder in the manager class but it will fail to log because the LogBuilder log() method must be called on the constructing thread.

@jonathan-buttner jonathan-buttner marked this pull request as ready for review March 19, 2025 13:49
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@jonathan-buttner jonathan-buttner merged commit 2fd70d9 into elastic:main Mar 20, 2025
16 of 17 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-improve-throttled-logging branch March 20, 2025 18:22
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Mar 20, 2025
* Refactoring throttled logging

* Renaming setting

* Resetting log counter when attempt to log for repeated messages

* Clean up

* Refactoring locking

* Comments

* Working tests

* Addressing logbuilder issue

* Fixing tests
elasticsearchmachine pushed a commit that referenced this pull request Mar 20, 2025
…5338)

* [ML] Refactoring inference API throttled logging (#124923)

* Refactoring throttled logging

* Renaming setting

* Resetting log counter when attempt to log for repeated messages

* Clean up

* Refactoring locking

* Comments

* Working tests

* Addressing logbuilder issue

* Fixing tests

* Removing version annotations
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Mar 21, 2025
* Refactoring throttled logging

* Renaming setting

* Resetting log counter when attempt to log for repeated messages

* Clean up

* Refactoring locking

* Comments

* Working tests

* Addressing logbuilder issue

* Fixing tests
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Mar 21, 2025
* Refactoring throttled logging

* Renaming setting

* Resetting log counter when attempt to log for repeated messages

* Clean up

* Refactoring locking

* Comments

* Working tests

* Addressing logbuilder issue

* Fixing tests
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
* Refactoring throttled logging

* Renaming setting

* Resetting log counter when attempt to log for repeated messages

* Clean up

* Refactoring locking

* Comments

* Working tests

* Addressing logbuilder issue

* Fixing tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged :ml Machine learning >non-issue Team:ML Meta label for the ML team v8.19.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants